Today we will…
Functions allow you to automate common tasks
We’ve been using functions since Day 1
Did you ever find yourself copy-pasting an analysis and changing small parts?
Writing your OWN functions has 3 big advantages over copy-and-paste:
Your code is easier to read
To change your analysis, simply change the function
No more mistakes from copy-paste
Let’s call the function!
add_two <-
The name of the function is chosen by the author.
The argument(s) of the function are chosen by the author.
We can supply a default value for an argument (here, something).
{ body }
The body of the function is where the action happens.
return()
Your function will “give back” whatever would normally “print out”.
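The pieces described above can be put together in a minimal sketch. The argument names follow the slide text, but the default value of 2 is an assumption made only for illustration.

```r
# Name: add_two. Arguments: x and something.
# The default something = 2 is assumed here for illustration.
add_two <- function(x, something = 2) {
  # Body: where the action happens
  return(x + something)
}

add_two(5)                  # uses the default value of something
add_two(5, something = 10)  # overrides the default
```

Calling the function with only x relies on the default; naming something in the call replaces it.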
add_something <- function(x, something) {
  # Stop early with an informative message when x is not numeric
  if (!is.numeric(x)) {
    stop("Please provide a numeric input for the x argument.")
  }
  return(x + something)
}
add_something(x = "statistics", something = 5)
Error in add_something(x = "statistics", something = 5): Please provide a numeric input for the x argument.
If an object doesn’t exist in the function’s environment, the global environment will be searched next; if there is no object in the global environment, the program will error out.
Objects you make in the function don’t affect “real life”.
This is an example of name masking, where names defined inside of a function mask names defined outside of a function.
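The lookup rules above can be seen in a short sketch. The object and function names here (y, add_y, mask_y) are hypothetical and chosen only for this illustration.

```r
y <- 10  # defined in the global environment

add_y <- function(x) {
  # y is not defined inside the function, so R finds the global y
  x + y
}

mask_y <- function(x) {
  # This local y masks the global y, but only inside the function
  y <- 100
  x + y
}

add_y(1)   # uses the global y
mask_y(1)  # uses the local y
y          # the global y is unchanged by mask_y()
```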
Interactive coding (highlight small pieces of your function and run them independently of the rest)
print() Debugging
Rubber Ducking
In general…
Write a simple example once (without a function)
Generalize by assigning variables.
Write into a function.
Call the function on desired arguments
find_car_make()
Write a function called find_car_make() that takes as input the name of a car, and returns only the “make”, or the company that created the car. For example, find_car_make("Toyota Camry") should return “Toyota” and find_car_make("Ford Anglica") should return “Ford”.
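One possible approach, sketched by following the four-step workflow. This is not necessarily the intended solution: it assumes the make is always the first word of the name and uses base R's sub() to drop everything from the first space onward.

```r
# Step 1: a simple example, without a function
sub(" .*", "", "Toyota Camry")

# Step 2: generalize by assigning a variable
car_name <- "Ford Anglica"
sub(" .*", "", car_name)

# Step 3: write it into a function
find_car_make <- function(car_name) {
  stopifnot(is.character(car_name))
  # Keep only the text before the first space (assumed to be the make)
  sub(" .*", "", car_name)
}

# Step 4: call the function on the desired arguments
find_car_make("Toyota Camry")
find_car_make(c("Mazda RX4 Wag", "Datsun 710"))
```

Because sub() is vectorized, this version also works on a whole column of car names, not just a single string.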
Consider mtcars
               mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Let’s use our function to create a new column in the data called make that gives the make of each car!
rownames_to_column() ❤️
library(tidyverse)  # provides rownames_to_column() (tibble) and mutate() (dplyr)
mtcars |>
  # Move the row names into a regular column so they can be used in mutate()
  rownames_to_column("make_model") |>
  mutate(make = find_car_make(make_model),
         .after = make_model
         ) |>
  head(n = 3)
     make_model   make  mpg cyl disp  hp drat    wt  qsec vs am gear carb
1     Mazda RX4  Mazda 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
2 Mazda RX4 Wag  Mazda 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
3    Datsun 710 Datsun 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Challenge 7: Incorporating Multiple Inputs
Could our function be more efficient?
Note
Notice how I relied on the existing function std_to_01() inside the new function, for clarity!
Functions that take unquoted variable names as arguments rely on nonstandard evaluation (in the tidyverse, tidy evaluation).
library(rlang)
In June 2019, rlang 0.4.0 introduced the “injection” {{ }} operator to simplify writing functions around tidyverse pipelines.
With the {{ }} operator you can inject the names of data variables (i.e. columns of the data frame) into function arguments!
Warning
This only works for select()-type functions, which use a literal (tidy) name of the variable to subset the data.
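A minimal sketch of embracing an argument inside a select()-type function. The wrapper name pick_column() is hypothetical and introduced only for this illustration.

```r
library(dplyr)

# pick_column() is a hypothetical helper, not a dplyr function
pick_column <- function(data, variable) {
  stopifnot(is.data.frame(data))
  # {{ variable }} injects the unquoted column name supplied by the caller
  data |>
    select({{ variable }})
}

pick_column(mtcars, mpg) |>
  head(n = 3)
```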
mutate() defuses the R code it was supplied.
body_mass_g = standardize(body_mass_g)
This is why we need injection!
std_column_01 <- function(data, variable) {
  stopifnot(is.data.frame(data))
  data <- data |>
    mutate({{ variable }} = std_to_01( {{ variable }})
    )
  data
}
Error: <text>:6:27: unexpected '='
5:   data <- data |>
6:     mutate({{ variable }} =
                             ^
Danger
Oh no! What happened?
The left-hand side of = is also defused!
:=
The “walrus operator” := is an alias of = that forces operations on the left-hand side not to be defused.
You can use it to supply names, e.g. a := b is equivalent to a = b.
While you could use this in an “ordinary” mutate() it is not necessary!
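A sketch of the function rewritten with :=. The definition of std_to_01() is not shown in this section, so the version below is an assumed stand-in that rescales a numeric vector to the range [0, 1].

```r
library(dplyr)

# Assumed stand-in for std_to_01(): rescales a numeric vector to [0, 1]
std_to_01 <- function(x) {
  stopifnot(is.numeric(x))
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

std_column_01 <- function(data, variable) {
  stopifnot(is.data.frame(data))
  # := lets the embraced name appear on the left-hand side
  data |>
    mutate({{ variable }} := std_to_01({{ variable }}))
}

std_column_01(mtcars, mpg) |>
  head(n = 3)
```

The column is overwritten in place, so the result has the same columns as the input with mpg rescaled.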
across()
What if I want to modify multiple columns?
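One way this might look, as a sketch: across() accepts an embraced column selection, so no := is needed. The function name std_columns_01() and the definition of std_to_01() are assumptions made for this illustration.

```r
library(dplyr)

# Assumed stand-in for std_to_01(): rescales a numeric vector to [0, 1]
std_to_01 <- function(x) {
  stopifnot(is.numeric(x))
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

# std_columns_01() is a hypothetical name for the multi-column version
std_columns_01 <- function(data, variables) {
  stopifnot(is.data.frame(data))
  # across() applies std_to_01() to every column selected by the caller
  data |>
    mutate(across(.cols = {{ variables }}, .fns = std_to_01))
}

std_columns_01(mtcars, c(mpg, hp)) |>
  head(n = 3)
```

The caller passes several columns with c(), just as in select().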
Without inspection:
Observations are “missing completely at random”
With information about the “missingness”:
Observations are “missing at random”
Look for patterns!
If fish length measurements are missing at random, conditional on month, year, and river section,
then the distributions of lengths will be similar for fish of the same month, year, and river section.
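Looking for patterns can start with counting missing values within each group. The sketch below uses a small toy data frame that is entirely hypothetical (invented values and column names), only to show the shape of the check.

```r
library(dplyr)

# Hypothetical toy data, invented only to illustrate the check
fish <- data.frame(
  year    = c(1989, 1989, 1990, 1990, 1990, 1991),
  section = c("A", "B", "A", "B", "B", "A"),
  length  = c(250, NA, 310, NA, NA, 275)
)

# Count missing length measurements within each year and river section
missing_by_group <- fish |>
  group_by(year, section) |>
  summarize(n_missing = sum(is.na(length)), .groups = "drop")

missing_by_group
```

If the missing values cluster in particular groups, the missingness depends on those variables rather than being completely random.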
Why Scale?
Easier to compare across variables
Easier to model (standardizes variance)
Why not Scale?
Article on How Building Functions with Variable Names has Changed Over the Years